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I. Introduction 


Spectral multigrid methods [9,13,17,18] combine the accuracy of spectral discretizations 
with the efficiency and flexibility of multigrid solution techniques. To date they have 
been implemented in an exclusively two-dimensional setting, with applications to elliptic 
model problems [9,17,18] and to compressible, potential flows [13]. In this paper, spec- 
tral multigrid methods are extended to three-dimensional periodic problems and applied 
to the large-eddy simulation of turbulent flow. This work represents one realization of 
the prospective large-scale applications of spectral multigrid methods that were discussed 
in [19]. 

In section 2, the concept of large-eddy simulation is introduced and the discretized 
Helmholtz equations are formulated. Section 3 discusses several multigrid algorithms suit- 
able for Poisson and Helmholtz equations. Numerical results on the model problems are 
discussed in section 4. Finally, the multigrid algorithms developed in the previous sections 
are incorporated into the full time-dependent turbulence simulation. Numerical results are 
given in section 5. 


II. Incompressible homogeneous turbulence 

Large-eddy simulation (LES) models the small spatial scales of a turbulent flow as a func- 
tion of the large scale variables [2,5,10,11,12,15]. A spatial filter applied to the velocities 
produces the large-scale velocities from which the small spatial scales have been removed. 
The validity of LES rests on the assumption that the small scale statistics are insensitive to 
geometry away from solid boundaries. Inasmuch as this criterion is satisfied, a good LES 
model is applicable to a wide variety of configurations. Alternatively, the Reynolds aver- 
aged Navier-Stokes equations result from time averaging the Navier-Stokes equations [14]. 
The resulting perturbation velocities, the difference between the true velocities and the 
averaged velocities, become the velocity fluctuations in time and contain information at 
all spatial scale lengths. Consequently, Reynolds averaged turbulent models are expected 
to be more geometry dependent than LES models. 

In non-dimensional form, the conventional Navier-Stokes equations are given by 

dv , _ 

— + V • (uv) = -Vp+ V • i/Vv (1) 

V • v = 0 (2) 

where v is the velocity vector, p is the static pressure, and u, the kinematic viscosity, is 
assumed to be constant. 

Any flow variable 7 can be spatially filtered in the following manner: 

7(x) = [ G(x — z, A) 7 [z)d 3 z (3) 
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where G is a filter function, A is the computational mesh size, and D is the domain of 
the fluid. It follows that Eq. (3) substantially reduces the amplitude of the high-frequency 
spatial Fourier components of any flow variable 7. Consequently, 7 can be more accurately 
termed the large-scale part of 7. 

The turbulent fields are decomposed into their large and small scale components based 
on the prescription 

7 = 7+7' ( 4 ) 

where 7' is the velocity representative of the small spatial scales. The direct filtering of 
the momentum equation yields 

dv 
dt 


where 


V-(vv) = -Vp+ V-i/Vw + Vt 

(5) 

V-v = 0 

(6) 

(v k Wi - V k Vi + V^Vt + v\v k + u^v,') 

■. This tensor can be decomposed into 

(7) 

L k i = -{v k v t - VkVt) 

(8) 

c ki = -{v' k vi + v\v k ) 

(9) 

1 

II 

(10) 


which are respectively, the subgrid-scale Leonard, cross, and Reynolds stresses [6]. 

The deviatoric, i.e. trace-free, part of the subgrid-scale Reynolds stress tensor, pR , is 
approximated by the Smagorinsky model 


dRh = u E D S kl 

where u E is the velocity dependent eddy viscosity 

u E {i 7) = 2 C R A 2 /4 /2 

with _ _ 

_ _ l f dv k dv ^ 

) 


( 11 ) 


( 12 ) 


( 13 ) 

( 14 ) 


2 dxi dx k 

= S mnS mn 

(i.e., S is the Favre filtered rate of strain tensor while IIj is its second invariant) and 
Cr is the Smagorinsky constant (the Einstein summation convention for repeated indices 
is assumed.) The cross and Leonard subgrid-scale stresses are approximated with the 
Bardina model [1]. Together with the subgrid-scale Reynold stresses, one obtains the 
linear combination model [1] 


— —d{vicVi — v k vi ) + v E dSiu). 


( 15 ) 
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For purposes of numerical computation, the subgrid-scale stresses are partitioned into 
the subgrid-scale Reynolds stress and the remaining terms. The latter terms contain no 
derivatives of velocity, and are therefore treated explicitly along with the advection term. 

Substitution of the subgrid-scale stress (15) into Eq. (1) transforms the momentum 
equation into _ 

dv .. . 

_ + V.(^) = - VP + Viu + u E )Vv + V-(L + C) (16) 

where the isotropic components of the total subgrid-scale stress have been lumped together 
with the pressure to produce a new pressure variable, P , defined by 

P = p + iLkk + /Cfcfc + iPkk- (17) 

The subscript I indicates that only the trace of the tensor is considered. 

For the isotropic turbulence problem, equations (16) and (6) are solved in a cubic 
computational domain, periodic in all three spatial directions. Fourier spectral methods 
are an established approach to this problem [10,12]. The solution is obtained in two steps. 
In the first step, the convective terms and the Leonard and cross subgrid-scale stresses 
are solved explicitly while the viscous terms are treated with an implicit algorithm. In 
the second step, a Poisson equation is solved for the pressure to insure that the velocity 
field remains divergence-free. For convenience, the overbars are removed hereafter from 
the primitive variables, and it is understood that the variables refer to spatially averaged 
quantities. For a first-order time discretization, the first step thus solves 

v* = xT - A t{u xv + V-( d L + D C )]" + AtV-(i/ + (18) 

Note that the momentum equation is used in rotation form. The pressure therefore acquires 
the additional term l/2|tT| 2 . As a result of the first step, one obtains an intermediate 
velocity field v** which serves as initial conditions for the correction stage 


tT +1 = iT - A t VP n+1 

(19) 

V-t7” +1 = 0 

(20) 


For spectral collocation algorithms, a direct solution of Eq. (18) is not feasible because 
the matrices, which represent the diffusion operators, are full. The alternative which is 
discussed here is to use iterative methods, and in particular spectral multigrid (SMG) 
methods, for the solution. The second step in the splitting algorithm can be transformed 
into a Poisson equation with constant coefficients which is solved exactly in Fourier space. 
In practice, both the explicit terms are solved with a third order Runga-Kutta algorithm, 
while the implicit diffusion terms are approximated with a Crank-Nicholson scheme. The 
implicit equations are solved at each of the three Runga-Kutta stages. 
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III. Spectral Representation 


When the solution to a numerical problem is approximated by a truncated series of ap- 
propriate global basis functions, the solution is said to have a spectral representation. 
The method of projecting the solution onto the basis function space determines the type 
of spectral approximation: Galerkin, tau or collocation. Spectral methods are explained 
thoroughly in [4,9] and a summary of their applications to fluid dynamics is provided in [7]. 
In this paper, only collocation methods are considered, since they are better suited to the 
solution of non-linear and variable coefficient problems. Periodicity further restricts us to 
a Fourier representation of the primitive variables. 

Consider the three dimensional periodic function u(r) on the domain [0, 2tt] 3 and its 
truncated Fourier representation 


N z / 2 Ny/2 N,/2 

= E E E 

k z =-N z / 2+1 k,=-N,/ 2+1 k.=-N./2+l 


,i(k z x+k y i/+k z z) 


The superscripts on u, henceforth omitted, refer to the set of collocation points 
fih = {xj,yk,zi) = ( jh x ,kh y ,lh z ), 0 < j < N x , 0 < k < N v ,0 < l < N z . 


( 21 ) 


( 22 ) 


where h = ( h x ,h v ,h z ) = (2n / N x ,2w / N y ,2n / N z ) . The number of collocation points in the 
x,y,z directions are respectively {N x ,N y ,N z ). To insure spectral accuracy, u{r) must be a 
C°° function. The function u, evaluated at the collocation points m,n,p, and the Fourier 
coefficients Uk z ,ky,k, are related through the pair of discrete Fourier transforms 


N z /2 Ny/2 


N./ 2 


£ £ 


E Uk z ,ky,k z e i{klXm+k ’ Vn+k '* p) . 

(23) 

=-N z /2+l ky=-Ny/2+l k. 

=—N z /2+l 


1 Nx 

X, 

N . 


• k ’ ~ N N N ^ 
z m=0 

£ 

\ p~ + + 

/ ^ u 'm,n,p c • 

(24) 

n= 0 

v - o 



First and second derivatives of u are simply obtained by differentiating Eq. (21) term 
by term and evaluating the result at the collocation points. For example, the first and 
second derivatives in the x-direction are 


du 

dx 


N z /2 Ny/2 N./2 

= E E E ikxUk z ,ky,k z e i{kxXm+k ' Vn+k ' tf) 

m,n,p k z =-N z /2+l ky=-N y / 2+1 k,=-N z /2+l 


and 


d 2 U 

dx 2 


m,n,p 


N z /2 

E 


k z =—N z /2+l 


Ny/2 

£ 


fcy — — JVy /2“f* 1 


k z =-N./ 2+1 


(25) 


( 26 ) 
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Derivatives in the y and z directions have similar expressions. When evaluating first 
derivatives, the highest mode in the direction the derivative (the N/2 mode) is removed 
since it makes a purely imaginary contribution to the first derivative at the collocation 
points. 

In many problems, one is required to solve large systems of equations on fine grids. 
However, direct methods are often impractical because of the size of the problem, and 
standard iterative methods have very slow convergence rates. Typically, the high frequency 
components of the error damp out quickly, while there is a very slow decay of the error 
on the larger scale lengths. Such relaxation schemes smooth out the error very quickly. 
Multigrid methods accelerate the convergence of iterative methods by recognizing that low 
frequency errors on a fine grid become high frequency errors on a coarse grid. Therefore, 
the smoothed residual is interpolated onto a coarser grid, and a new set of equations, similar 
to the original set, is solved. The coarsening process is continued until a sufficiently coarse 
grid is reached on which a direct solution procedure is relatively inexpensive. From the 
coarsest grid solution, a solution on the next finer grid is obtained by prolongation of the 
coarse grid correction onto the next finer grid, optionally followed by several relaxation 
sweeps to eliminate the high frequency errors introduced by the interpolation process. In 
general, therefore, multigrid algorithms have three components: a restriction operator to 
transfer residual information from the finer to coarser grids, a prolongation operator to 
extend a coarse grid correction to the next finer level, and a smoothing algorithm whose 
objective is to reduce the high frequency components on a given level. There exists an 
extensive literature on multigrid algorithms. Several good review papers appear in [8]. 

Spectral multigrid distinguishes itself from other types of multigrid approaches in the 
choice of the interpolation and prolongation operators. In the problem considered here, all 
functions are periodic. This leads to the preferred truncated Fourier series representation. 
Following [17], interpolation of a variable from a fine to coarse grid consists of the following 
steps. Transform the variable to Fourier space, reject the highest modes not resolvable on 
the coarse grid, and transform back to physical space on the coarse grid. Prolongation 
is done in a similarly straightforward manner. After transforming the variable to Fourier 
space, additional terms are added to the Fourier series with zero coefficients. The newly 
defined function is then transformed back to physical space on the fine grid. Contrary 
to the more popular interpolation methods used in the finite-difference context, which 
always introduce high frequency components into the solution, the spectral interpolation 
just described is exact for solutions to the constant coefficient Helmholtz equation. 

Fast Fourier transform (FFT) methods permit the basic interpolation calculations to 
be performed in 0(N 3 log N) floating point operations, when the number of nodes in 
all three directions is equal N x = N y = N z = N. The grid transfer operators and the 
residual calculation are both based on FFT’s, the former to interpolate variables between 
different grids, and the latter to perform first and second derivative evaluations at the 
grid points. Therefore the overall multigrid scheme has an operation count proportional 
to 0{N Z log N). 
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A. Relaxation Scheme 


The use of FFT’s to calculate the residual restricts the choice of relaxation schemes to si- 
multaneous relaxation schemes such as Jacobi and Richardson. These relaxation schemes 
are implemented in physical space. Consider the constant coefficient scalar Poisson equa- 
tion 

V 2 u = f{r). (27) 

The Richardson scheme is one of the simplest smoothers. Applied to Eq. (27), the solution 
after one smoothing step becomes 

u *— u — wr, (28) 

where r is the residual /— V 2 u and oj is the relaxation parameter. Both stationary (fixed oj) 
and non-stationary (variable u>) are considered. Equation (28) admits a Fourier analysis. If 
a single three-dimensional Fourier mode ( j,k,l ) is substituted into Eq. (27), the smoothing 
rate, n, becomes 

H{0) = |1 - w(fc 2 + k] + k\)\ (29) 

where u> is the relaxation parameter. In the context of multigrid methods, the objective 
is to minimize the smoothing rate of the high frequencies seen by the fine grid, and not 
resolvable on the coarser ones. Given an existing grid, the next level of coarsening is 
obtained by defining the set of collocation points 0 2 /»- The range of wavenumbers over 
which the minimization is performed is the difference between the two cubes (in wave 
number space) [0, y] 3 and [0, y] 3 . Strictly speaking, because both the mean mode and the 
N/2 mode have been filtered out of the right hand side, /(r), the wave numbers considered 
for the minimization should actually be in the region defined by the difference between the 
cubes [l, y] 3 and [l, y] 3 . 


A straightforward calculation leads to an optimal smoothing rate of 


H = 


N d -\ 

N d +\ 


(30) 


for the Richardson iteration scheme, where N d is the number of spatial dimensions. When 
N d = 3, 7? = 11/13 « .85. It is obvious from Eq. (30) that the asymptotic smoothing rate 
increases with increasing spatial dimension. For example, as the number of dimensions 
increases from 1 to 3, fl increases from 0.6 to 0.85. For 3-D problems, the optimal relaxation 
parameter is 

32 

(jJ = 

13AT 2 



Operators that are spectrally discretized have a wider spread of eigenvalues than their 
finite-difference counterparts. For example, the optimal Richardson smoothing rate for a 
second-order, central difference discretization of the constant coefficient Poisson equation 
is 


Pfd ~ 


Nd + j 


(32) 
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In contrast to the spectral smoothing rates, n FD ranges from 0.33 to 0.71 for 1, 2 and 3-D 
problems. 


Brandt, Fulton and Taylor [3] applied the residual averaging technique to accelerate 
the convergence of Richardson’s smoothing algorithm for two-dimensional Fourier repre- 
sentations. The extension to the present three-dimensional problem is straightforward. 
The smoothing algorithm now satisfies 


^ mnp 


AlT A 

Umnp jy2 ^ r mnp 


(33) 


where (m,n,p) is the grid point to which the smoother is applied and A is the averaging 
template. A is the three-dimensional array 


( 6 7 6 ^ 


( 1 P 1 ) 


/ 6 7 6 \ 

1 Pi 

5 

pap 

5 

1 Pi 

6 1 6 ) 


'I P 1 J 


8 7 6 J 


(34) 


If elements of A are denoted by -A,-,*, the three arrays in the above expression correspond 
(from left to right) to fc = 1,2,3. The parameter 6 is associated with the 8 comer points 
of the cube centered at (m, n,p) about which the averaging is being performed. In con- 
ventional notation, the averaging template A applied to the residual at (m,n,p) produces 
the expression 

-A r m,n,p == [o' + P 53 "b7 J3 5Z ] r m+»,n+j,p+fc' (^5) 

M+W+W=l l<i+lil+l*l=2 M+lil+W=» 

Such a scheme is called weighted residual averaging (WRA). A Fourier analysis applied to 
Eq. (33) yields 

4tt 

n(k x ,lc v ,k z -,a,p,i,6) = \1 - —{k 2 x + + k 2 z )[a + 2P(cos0 x + cos0 y + cos0 z ) 

+47(cos 0 y cos 0 Z + cos 0 Z cos 0 X + cos 0 X cos 0 V ) 

+85 (cos 0 X cos 0y cos 0 Z )]|. (36) 

where (0 X , 0 y , 0 Z ) is a shorthand notation for ^{k x ,k y ,k z ). The solution to the minimax 
problem 

p= min ma x fi(0 x ,0 y ,0 z \a,P,^,6) (37) 

<*,/?, n i,o 

yields the optimum parameters a,P , 7 , and 6 as well as the smoothing rate 7 1 . This is solved 
numerically. The angles lie in the region formed by the difference between the two cubes 
[0, tt/ 2] 3 and [0, vr/4] 3 . To demonstrate the importance of all four averaging coefficients, 7 1 
was calculated by successively increasing the number of non-zero coefficients. The results 
are presented in table 1 where a comparison is made with the 2-D results of Brandt, Fulton 
and Taylor [3]. In both 2-D and 3-D, JZ decreases substantially with the help of residual 
averaging. As expected, the minimum 3-D spectral radius is higher than the optimal 2-D 
value. 
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2-D n 
0.777 
0.472 
0.106 


Table 1: Optimal averaging parameters and smoothing rates for weighed residual 
averaging scheme. 


a 

P 

7 

6 

3-D n 

0.062 

0.000 

0.0 

0.0 


0.101 

0.145 

0.0 

0.0 

0.608 

0.120 

0.287 

0.0083 

0.0 

0.453 

0.144 

0.042 

0.0185 

0.0085 

0.195 


B. Variable Coefficient Poisson Equation 

The analysis of the previous section is exact for the constant coefficient Poisson equation. 
More generally, one wishes to solve 

V-a(r)Vu = f(r) (38) 

where a(r) is a strictly positive C°° function. Although convergence is no longer theoret- 
ically guaranteed, good results are obtained if the residual is first divided by a(r) before 
averaging. Alternatively, one can use Eq. (33) after replacing 47T 2 /N 2 by 4n 2 /a(r)N 2 . 
Both approaches yield similar results. The results presented herein are based on the for- 
mer method. 


C. Helmholtz Equation 

With the good convergence rates achieved for the Poisson equation, attention is now fo- 
cussed on the three-dimensional Helmholtz equation 

V-aVu — Au = /(r) (39) 

where a(r) is a C°° function and A is a positive constant. The iteration scheme and its 
associated convergence rate are respectively 

u <— u — u (/(r) — V-oVu + Au) (40) 

and 

H = |1 - w{k 2 + A) | (41) 

where, as previously, the number of grid points is assumed to be equal in all directions 
(N x = N y = N z = N) . The optimum smoothing rate is 

11/13 

M “ 32 A 

+ 13iV 2 a 
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which is exact for constant coefficient a. (It is assumed that the k x = k v = k z = 0 mode is 
solved for exactly on the coarsest grid.) Note that the difference in smoothing rates of the 
Poisson and Helmholtz operators decreases with increasing grid size. 


Experiments were also performed with a non-stationary Richardson smoothing algo- 
rithm. If k is the condition number 

/c=^i (43) 

Amin 

of the discrete spectral operator V*aV - A, the j th relaxation parameter, u k of a k- 
parameter cycle is 

2 / Xmin 


u k = 

3 


(* - l)cos(2i_^) + (k + 1) 


j = l,...,k 


(44) 


i /} 


and the corresponding smoothing rate is 

^ = " < 45 > 

which is the solution to a standard minimax problem [16]. The range of frequencies that 
are preferentially damped are in the domain defined by the difference between the two 
cubes [0, A mai ] 3 and [0, A m , n ] s . As a function of A and of the coefficient a(r), the minimum 
and maximum eigenvalues of the discrete Helmholtz operator are 


_ aN 2 

A min „ "t~ A, 


_ 3 aN 2 

Amax — , “r A 


16 — 4 ■ ~ ( 46 ) 
and the sequence of optimal relaxation parameters in the non-stationary Richardson iter- 
ation scheme become 

32 1 


u>* = 

3 aN 2 + 16A 


(«-lW ' (i [ ( 1} ) + (« + !) 


(47) 


The number of terms in the sequence is set equal to the number of smoothing sweeps. For 
stationary Richardson, j = 1 and Eqs. (47) reduces to 

32 


. . _ 13AT 2 

w _ W 

l + 


(48) 


13JV 2 a 

while fl is given by Eq. (42) If the value of A m , n and A moa: (Eq. (46) are inserted into the 
condition number defined by Eq. (43), it is clear that the convergence rate must increase 
when either JV, or A is increased. 

Table 2 confirms that ~p decreases with increasing A and j. In practice, a 3-cycle scheme 
is sufficient to reduce the L 2 norm of the residual by a factor of 5. An alternate formulation 
of the Richardson method is obtained when the Helmholtz term is treated implicitly, i.e 

u - u>p(/(r) - V-aVu) 


u 


1 -{• A a-' p 


(49) 


I 


or equivalently 


(50) 


u <— u + — V-aVu) 

where is the implicit Helmholtz relaxation parameter 

C Op 

— - — ■ • 

1 + A u)p 

If wp is the optimum relaxation parameter for the constant coefficient Poisson operator, 
given by 

32 

UP = 13iVV ^ 

u>h is identical to the value obtained in Eq. (48). Therefore the two algorithms are identical 
for all a > 0. However they differ from one another for non-stationary Richardson. Indeed, 
in the implicit formulation, one chooses the w* to optimize the convergence of the Poisson 
operator which leads to relaxation parameters independent of A. The A dependence is 
introduced as an extra positive term in the denominator of Eq. (49). This is in contrast 
to w* given by Eq. (47) where A appears in the coefficient of the cosine function. Table 3 
illustrates the differences between the two approaches when non-stationary Richardson is 
used with cycles of varying length. The two methods give approximately identical smooth- 
ing rates, except in the limit of large A where the implicit method gives slightly better 
performance. From a practical point of view, it is cheaper to evaluate the acceleration 
parameters for the implicit scheme because a factor l/a can be factored out of w* and 
combined with the residual. This is not possible for the explicit formulas which depend on 
r. 



k 

m(A = 0) 

/z(A = 10) 

J[{ A = 50) 

1 

0.846 

0.826 

0.755 

2 

0.747 

0.720 

0.631 

3 

0.689 

0.661 

0.573 

4 

0.655 

0.628 

0.542 

5 

0.634 

0.607 

0.524 

6 

0.619 

0.593 

0.512 


Table 2: Smoothing rates for non-stationary Richardson iteration applied to the 
Helmholtz equation. 


D. Implementation 

The stationary and non-stationary relaxation schemes were implemented in a simple V- 
cycle formulation which is described in detail in [17,18] for the two-dimensional Poisson 
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32 3 

Grid Size 
64 3 

128 3 

1 

A 10 
50 

0.652/0.651 

0.628/0.618 

0.542/0.511 

0.654/0.654 

0.648/0.645 

0.621/0.610 

0.655/0.655 

0.653/0.653 

0.646/0.643 


Table 3: Comparison of convergence rates of explicit versus implicit non-stationary 
Richardson iteration algorithms. 


equation. Results are based on a fine grid of 32 s and 3 coarser grids of 16 3 , 8 3 and 4 3 . A 
constant number of relaxation sweeps on each grid on the upwards ( N u ) and downward 
(Nd) branches of the V-cycle was found to provide good overall smoothing rates for all 
the cases that were considered. An effective rule of thumb is to decrease the residual by 
an order of magnitude on each grid. This leads to Nj = 2 for WRA, because of the high 
smoothing rates, and Nj = 3 for the non-stationary Richardson (NSR) schemes. This 
corresponds to an approximate decrease of the L 2 norm of the residual on the order of 0.2 
and 0.05 for the WRA and NSR respectively (per Nj fine grid relaxations). In all tests, 
N u = 1. This is because the variable coefficient a introduces high frequencies into the 
residual after a prolongation. 

All the computations presented were performed on the Cray 2. Fast Fourier transforms 
are coded in Fortran, and achieve a 100 Mflop rate on grids on 64 3 and above. Timings 
may fluctuate by 10% — 20% for identical runs due to system load. 


E. Results 


There are many different approaches to measuring the efficiency of a multigrid algorithm. 
Ultimately, the user is interested in the total CPU clock time a code takes to complete 
execution. However, this timing is strongly computer dependent, and on a given system, 
the programmers skill can greatly influence the results. It is therefore necessary to supple- 
ment the CPU time with more intrinsic measures. The simplest measure is the smoothing 
rate Ji of the smoother on the finest grid. Unfortunately, Jl does not take into account the 
work done on the coarser grids, nor does it account for the time spent performing grid 
transfers. Alternate criteria are required to take this work into account. One method is 
to calculate the ratio of L 2 norms of the residual after and before a single V-cycle, and 
calculate a smoothing rate per V-cycle as 


H v — 


lk-«ll \ 

1 1 T entry \ \ j 


(53) 
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where Nf is the total number of fine grid sweeps during one V-cycle (7V U + N d ). Although 
Jly is still not useful as an accurate measure of efficiency since it does not take residual 
transfers into account, it can help establish whether the high frequencies seen by each grid 
are damped at the same rate. If JZ and fZ v are unequal and the number of relaxations is the 
same on every grid (except perhaps the coarsest), the frequency content of the error vector 
is unevenly distributed, and the smoothing rates will be different on each grid. Therefore, 
Jl v is useful as a diagnostic tool. 

A better measure of the overall algorithm efficiency is obtained from the number of 
equivalent fine relaxation sweeps defined as 

N CPU time per V — cycle 

eq CPU time per fine grid relaxation 

The equivalent convergence rate 



measures the decrease in the residual norm per fine sweep taking the total multigrid over- 
head into account. Together with the total CPU time per V-cycle, the performance of the 
algorithm can be ascertained. In what follows, performance is measured exclusively by /Z 
and fl T . Processing time is a function of the number of relaxation sweeps on the way up 
and down the V-cycle, and on the grid size. These parameters are kept constant within a 
given table except in table 6 where the effect of fine-grid resolution is studied. Therefore, 
the decrease in the residual norm after a fixed number of multigrid cycles is an objective 
measure of the algorithm’s efficiency. 

As explained in detail in [18], the N/2 Fourier mode must be filtered out of the residual 
every time it is computed. In one dimension, this is done in physical space by projecting 
the residual function onto the space orthogonal to (— i)\j = 1 ,N X . In higher dimensions, 
a sequence of 1-D filtering operations is performed. The filtering operation consumes 
approximately 25% of the residual calculation on a 32 3 grid. The influence of vectorization 
is clearly seen from the decrease in relative time spent filtering as the grid size increases. 
For example, as N increases from 32 to 128, the percentage of time spent filtering, measured 
with respect to the total time spent calculating the residuals (which includes the filtering), 
decreases from 25% to 13%. Equivalently, the percentage of time spent in 3-D derivative 
evaluations during the computation of the residual increases from 67% to 77% as the grid 
size increases from 32 s to 128 3 . 

When solving the Poisson equation, the mean value of the right-hand side of the equa- 
tion must be filtered out to insure convergence of the residual towards zero [18]. This is 
done once at the beginning of the calculation. 

All the numerical experiments were done with the Helmholtz equation (with A set to 
a constant value). The Poisson equation is obtained by setting A to zero. The coefficient 


12 



a(r) is set to 


a(r) = 1 + (56) 

In all cases considered, the right hand side of the Helmholtz equation is calculated to insure 
that the exact solution is 

u ex (r) = sm(N x 7T sin(:r)) sin(N v n sin(y)) sin(iV z 7r sin( 2 )) (57) 

The factors N x ,N y and N z are included to insure that the complete spectrum of spatial 
scales are equally represented in the error vector. This insures that JZ = Jl v . Such a 
solution however, precludes a direct comparison of the computed and the exact solution 
because u ex is no longer well represented by the collection of Fourier modes. 

Convergence results for the WRA are presented in table 4. The smoothing rates ob- 
tained for the constant coefficient Poisson equation are lower than the theoretical predic- 
tions. This is mainly because the analytic results presented in Table 1 are only valid in 
the limit N — * oo. Although a(r) varies by a factor 20 across the physical domain when 
c = 1, 7* remains close to the optimal value of 0.2. Experiments have shown that a large 
degradation of JZ occurs for e greater than 1.5. Seven V-cycles reduced the residual 10 to 
11 orders of magnitude. Timings indicate that the number of equivalent fine relaxations is 
3.45 on a 32 3 grid. This is larger than is expected from the sum of the work done on the 
sequence of grids obtained from summing the geometric series 1 + (|) S +(|) 3 +(^) 3 H 


6 

0.0 0.5 1.0 

1 



0.17 0.18 0.19 

0.61 0.58 0.61 

4(-ll) 3 (-10) 1(-10) 


Table 4: Rates of convergence for Poisson equation 


which leads to 1.13 equivalent work units. The discrepancy between the theoretical and 
numerical results are due to the time spent in the grid transfers which are not included in 
the geometric series, and the inefficiency of the Cray 2 program on the coarser grids. 

Large eddy simulations of turbulence require that the numerical scheme be capable of 
resolving a wide range of spatial scale lengths. Furthermore, the velocity fields have a 
quasi-random distribution over the set of scale-lengths that survive the the filtering. In 
typical large eddy simulations, the smallest fluctuations seen by the LES code are on the 
order of 4 fine grid cell widths (albeit with small amplitudes). It is therefore important 
to ascertain the influence of spatial structure in a(r) on the smoothing rate of the Poisson 
and Helmholtz operators. To this effect, the definition of the coefficient a(r) is slightly 
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modified according to 


_ 1 _|_ ggCosfn^J+cosKvJ+cosfn^)^ (5g) 

With 6 = 0.5, a varies from 1 to about 10. The influence of n £ is to introduce high frequency 
content into the coefficient without affecting its range. In other words, the higher n e , the 
smaller the distance over which a assumes its maximum variation. Table 5 shows that 
small values of n £ do not adversely affect the convergence rate of the iteration scheme. 


n e 

~P 

Hy 

1 

0.17 

0.17 

2 

0.18 

0.18 

4 

0.55 

0.55 

8 

0.58 

0.62 


Table 5: Effect of high frequency content of a(r) on the smoothing rate 


However, the rate of convergence rate quickly deteriorates for n e > 4. Note that when the 
variation of a becomes too rapid, there is a discrepancy between JZ and JZ V . This probably 
indicates that there is a frequency imbalance across the various grid levels, due to the grid 
transfer operators which do not properly interpolate a(F) onto the coarser grids. Similar 
experiments performed on the Helmholtz equation indicate that A has a beneficial effect 
on the smoothing rate as n e is increased. Of course, JZ is higher than the worst Poisson 
result since residual averaging is not allowed. 

The Helmholtz equation is numerically solved for several values of e and A. Non- 
stationary Richardson with a cycle of 3 produces the results displayed in table 6. Pairs 
of numbers correspond to (e = 0/e = 0.5). As expected, for a fixed e, JZ decreases with 
increasing A due to the reduction of the condition number 

(59) 


— aN 2 + A 
= 16 

-aN 2 + A 
4 

For the larger values of A, the effect of non-constant coefficient a is more severe. Whereas 
JZ is almost unaffected by e for A < 40, the differences in smoothing rates are substantial 
for A > 100. For instance, when A = 1000, JZ is approximately 3 times larger for e = 0.5 
than for e = 0.0. Similar trends occur for JZ T . However, the influence of e on JZ T is less 
dramatic at high A. 


(60) 


K = 


14 



A 

i 

10 

100 

1000 

wnm 


0.66/0.67 

0.49/0.62 



0.87/0.88 

0.86/0.86 



HSM 

3(-5)/l(-4) 

l(-5)/2(-5) 

5(-9)/2(-6) 

l(-ll)/2(-9) 


Table 6: Rates of convergence for Helmholtz equation 


Optimal multigrid methods produce convergence rates independent of the grid size. 
This is tested for the Helmholtz equation at e = 0.5 and A = 10. Table 7 indicates that 
both ~p and ~p T are approximately constant over fine grid sizes that range between 32 3 and 
128 s . In all cases, the coarsest grid level is 4 s . Computer timings for the calculations are 
also presented. They are normalized to 1 on the 64 3 grid. In all cases, the code is stopped 
after a fixed number of V-cycles. Increased time spent on the 128 3 grid relative to the 64 3 
agrees quite well with theoretical predictions. This is in contrast to the extra 70% CPU 
time spent on the 32 3 grid than allowed for by the 0(N log N) scaling. Poor vectorization 
on this grid is the probable cause of this discrepancy. 


grid size 

32 3 

64 3 

128 s 

V- 

0.67 

0.67 

0.68 

Pt 

0.86 

0.85 

0.85 

IM| 

TTT 

1.9(-5) 

1.5(-4) 

2.5(-5) 

CPU time (arbitrary units) 

0.18 

1.0 

9.0 

0(N 3 log N) (arbitrary units) 

0.10 

1.0 

9.3 


Table 7: Grid independence for Helmholtz equation, e = 0.5, A = 10 


F. Large Eddy Simulation 

The implicit stage of the LES equations requires the solution to the set of three scalar 
Helmholtz equations given by equation (18). Defining 

_ V + Ve{v) 

<V + Ve ( v ) > 
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2 


( 62 ) 


< v + Ve{v) > At’ 

Equ. (18) reduces to the three scalar Helmholtz equations 

V • a(v)Vu(r) — Av = tf, (63) 

where v* is the flow velocity after the explicit step. The above definitions of a(r) and 
A insure that < a >= 1, which allows the numerical results to be compared against the 
results in the previous sections. 

A major difficulty expected to reduce the efficiency of the multigrid implementation, 
when compared with the theoretical and model problem results, is that a(r) is now, in 
effect, a random function of the spatial coordinates. Numerical experiments indicate that 
the multigrid algorithm fails to converge for A below a certain threshold. In the current 
code, this threshold is A fa 100. Convergence rates and timings are shown in table 8. 
Calculations were performed on a 32 s grid. 


Although equation (63) has the same functional form as the model equation, the effect 
of a(f) and A on the overall multigrid efficiency relative to a purely explicit scheme is not 
easy to determine. One reason is the strong dependence of these parameters on the filter 
width A, kinematic viscosity i/, and Smagorinsky constant Cr. To understand better the 
interelations between these parameters and the possible gain of a multigrid strategy, let 


_ A **•// 

At adv 


(64) 


where A t adv and At*// are respectively the maximum time steps calculated for the explicit 
advection and diffusive terms. The accuracy of the simulation is mostly determined by the 
advection terms. Implicit algorithms are consequently most favorable when the diffusion 
time step is much smaller than A t adv (i.e. when Z « 1). Stated differently, a fully 
explicit solver is cheaper than a mixed explicit /implicit scheme when Z » 1 . For Fourier 
methods, the maximum advection Courant number of the third order Runga-Kutta method 
is 0.6, which leads to 

Ax 

A t adv = 0.6 . (65) 

v 

where v is a representative fluid velocity. (The estimates in this section are for a one- 
dimensional problem.) On the other hand, the diffusion time limit is 


A t d iff 


0.25 


Ax 2 

V + Ve 


( 66 ) 


In this discussion, all the variables are assumed to be constant. Equations (65) and (66) 
combine into 


Z = 


vAx 


(67) 


f(v)C R n\ Ax 2 + v 

Several substitutions have been made to arrive at Eq. (67). The filter width A has been 
replaced by n A Ax to separate the computational grid size from the actual filter width. 
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Thus, when n A is increased, the filtering is stronger, and more high frequencies are removed 
from the large-eddy velocities. The velocity dependence of the Smagorinsky model is 
included in f(v) whose magnitude is a slowly decaying function of n A . 

As confirmed in table 8, the convergence rate improves with increasing A. However, 
according to Eq. (62), a higher A is the result of a decrease in n A , Cr or u. Other sources 
of variation are not considered here. Equation (67) therefore clearly indicates that a higher 
A reduces the gain of the multigrid scheme over the purely explicit scheme. This is borne 
out by table 8. The multigrid code performs 2 times slower than the explicit scheme when 
A = 130 and 9 times slower when A = 1000 although JZ has dropped from 0.58 to 0.12. 
Furthermore, as A increases, so does R. which explains why the multigrid code performs so 
poorly (compared to the explicit scheme) for large A. The variation of the results in the 
table 8 is not uniformly monotonic as a function of A. This is partially because timings 
are a function of the load on the Cray 2, and of the interaction between the various 
independent control parameters. For example, if A is increased through a smaller value 
n A , the high frequency content in the velocities is reduced and better SMG convergence 
rates are expected, not only due to the larger Helmholtz coefficient, but also because of 
the smoother velocity fields. On the other hand, only the former cause for improvement 
is remains when C R is decreased. 

Finally, a(r) was assumed to be constant in the above analysis. In all probability, the 
oscillatory nature of a(r) will also strongly influence the performance of the SMG. When 
A drops below 100, the implicit scheme fails to converge. This might be related to the 
highly oscillatory coefficients which appear when simulating turbulent flows. 


A 

A* 

impl. code vs. expl. code 
time /step time /run 

<100 non-converged 


130 

0.58 

16 

2 

200 

0.75 

21 

2 

240 

0.26 

12 

2 

480 

0.21 

10 

3 

540 

0.20 

12 

4 

700 

0.20 

14 

7 

1000 

0.12 

12 

9 


Table 8: Numerical results from SMG incorporated into the multigrid code. Figures 
refer to 5 complete time steps. 


More efficient Helmholtz solvers must be developed before spectral multigrid algorithms 
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will strongly outperform the explicit schemes for the simulation of turbulent flows. The 
quasi-random coefficients in the LES equations also call for new spectral interpolation 
procedures to improve the robustness of the method. 


G. Conclusion 

Three dimensional periodic Poisson and Helmholtz equations have been solved with a 
3-D spectral multigrid algorithm. Convergence rates for the Poisson problem are best 
when weighted residual averaging is adopted. The spatially dependent coefficient a(r) was 
allowed to vary by more than an order of magnitude without affecting overall convergence 
rates. Although weighted residual averaging is impractical for the 3-D Helmholtz equation, 
non-stationary Richardson is a viable alternative for a wide range of a and A. At a fixed 
amplitude variation, high spatial frequency content of a had a deleterious effect on Ji for 
the Poisson equation. This is unfortunate since for turbulent simulations the variables are 
necessarily oscillatory. 

The algorithms herein were successfully incorporated into a full 3-D, non-stationary 
incompressible LES code. It was found that in the range of parameters examined, the 
SMG takes at least twice as long as an explicit calculation. This is part due to the 
relatively large spread of eigenvalues for a Fourier collocation algorithm. Another cause is 
most probably related to the strong and rapid variations of a(r) . 
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